


GAWK(1)                 Utility Commands                  GAWK(1)



NAME
     gawk - pattern scanning and processing language

SYNOPSIS
     gawk [ -W gawk-options ] [ -Ffs ] [ -v var=val ] -f
     program-file [ -- ] file ...
     gawk [ -W gawk-options ] [ -Ffs ] [ -v var=val ] [ -- ]
     program-text file ...

DESCRIPTION
     _G_a_w_k is the GNU Project's implementation of the AWK program-
     ming  language.   It  conforms  to  the  definition  of  the
     language in the POSIX 1003.2 Command Language And  Utilities
     Standard  (draft  11).  This version in turn is based on the
     description in _T_h_e _A_W_K _P_r_o_g_r_a_m_m_i_n_g _L_a_n_g_u_a_g_e,  by  Aho,  Ker-
     nighan, and Weinberger, with the additional features defined
     in the System V Release 4 version of UNIX  _a_w_k.   _G_a_w_k  also
     provides some GNU-specific extensions.

     The command line consists of options to gawk itself, the AWK
     program text (if not supplied via the -f option), and values
     to be made available in the ARGC and ARGV pre-defined AWK
     variables.

OPTIONS
     _G_a_w_k accepts the following options, which should  be  avail-
     able on any implementation of the AWK language.

     -F_f_s Use _f_s for the input field separator (the value of  the
          FS predefined variable).

     -v _v_a_r=_v_a_l
          Assign the value _v_a_l, to the variable _v_a_r, before  exe-
          cution of the program begins.  Such variable values are
          available to the BEGIN block of an AWK program.

     -f _p_r_o_g_r_a_m-_f_i_l_e
          Read the AWK program source from the file _p_r_o_g_r_a_m-_f_i_l_e,
          instead  of from the first command line argument.  Mul-
          tiple -f options may be used.

     --   Signal the end of options.  This  is  useful  to  allow
          further  arguments  to  the AWK program itself to start
          with a ``-''.  This is mainly for consistency with  the
          argument  parsing  convention  used by most other POSIX
          programs.

     Following the POSIX standard, gawk-specific options are
     supplied via arguments to the -W option. Multiple -W options
     may be supplied, or multiple arguments may be supplied
     together if they are separated by commas, or enclosed in
     quotes and separated by white space. Case is ignored in
     arguments to the -W option.

Free Software Foundation          Last change: Jun 5 1991

     The -W option accepts the following arguments:

     compat    Run in compatibility mode. In compatibility mode,
               gawk behaves identically to UNIX awk; none of the
               GNU-specific extensions are recognized.

     copyleft
     copyright Print the  short  version  of  the  GNU  copyright
               information message on the error output.

     lint      Provide warnings about constructs that are dubious
               or non-portable to other AWK implementations.

     posix     This turns on compatibility mode, with the
               following additional restrictions:

               o \x escape sequences are not recognized.

               o The synonym func for the keyword function is not
                 recognized.

               o The operators ** and **= cannot be used in place
                 of ^ and ^=.

     version   Print version information for this particular copy
               of gawk on the error output. This is useful
               mainly for knowing if the current copy of gawk on
               your system is up to date with respect to whatever
               the Free Software Foundation is distributing.

     Any other options are flagged as illegal, but are  otherwise
     ignored.

AWK PROGRAM EXECUTION
     An AWK program consists  of  a  sequence  of  pattern-action
     statements and optional function definitions.

          _p_a_t_t_e_r_n   { _a_c_t_i_o_n _s_t_a_t_e_m_e_n_t_s }
          function _n_a_m_e(_p_a_r_a_m_e_t_e_r _l_i_s_t) { _s_t_a_t_e_m_e_n_t_s }

     _G_a_w_k first reads the program source from the _p_r_o_g_r_a_m-_f_i_l_e(s)
     if  specified,  or from the first non-option argument on the
     command line.  The -f option may be used multiple  times  on
     the command line.  _G_a_w_k will read the program text as if all
     the _p_r_o_g_r_a_m-_f_i_l_es had been concatenated together.   This  is
     useful for building libraries of AWK functions, without hav-
     ing to include them in each new AWK program that uses  them.
     To  use a library function in a file from a program typed in
     on  the  command  line,  specify  /dev/tty  as  one  of  the
     _p_r_o_g_r_a_m-_f_i_l_es,  type  your  program,  and  end  it with a ^D



Free Software FoundationLast change: Jun 5 1991                    2






GAWK(1)                 Utility Commands                  GAWK(1)



     (control-d).

     The environment variable AWKPATH specifies a search path  to
     use  when finding source files named with the -f option.  If
     this  variable  does  not  exist,  the   default   path   is
     ".:/usr/lib/awk:/usr/local/lib/awk".   If  a file name given
     to the -f option contains a ``/'' character, no path  search
     is performed.

     _G_a_w_k executes AWK programs in the following  order.   First,
     _g_a_w_k  compiles the program into an internal form.  Next, all
     variable assignments specified via the -v  option  are  per-
     formed.   Then, _g_a_w_k executes the code in the BEGIN block(s)
     (if any), and then proceeds to read each file named  in  the
     ARGV  array.   If  there  are  no files named on the command
     line, _g_a_w_k reads the standard input.

     If a filename on the command line has the form var=val it is
     treated as a variable assignment. The variable var will be
     assigned the value val. (This happens after any BEGIN
     block(s) have been run.) Command line variable assignment is
     most useful for dynamically assigning values to the
     variables AWK uses to control how input is broken into fields
     and records. It is also useful for controlling state if
     multiple passes are needed over a single data file.
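
     The ordering described above can be observed directly. The
     following is a minimal sketch, assuming a POSIX-compatible
     awk is installed under the name awk (gawk may be substituted);
     the variable names early and late are illustrative only.

```shell
# "early" is set with -v, so the BEGIN block can see it; "late" is a
# command-line operand, so it is assigned only after BEGIN has run.
# The trailing "-" operand names the standard input.
out=$(printf 'x\n' | awk -v early=1 '
BEGIN { print "BEGIN:", early, "[" late "]" }
      { print "main:", early, late }' late=2 -)
printf '%s\n' "$out"
```

     In BEGIN the variable late is still uninitialized and prints
     as the empty string; by the time the first record is read it
     holds 2.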

     If the value of a particular element of ARGV is empty (""),
     gawk skips over it.

     For each line in the input, gawk tests to see if it matches
     any pattern in the AWK program. For each pattern that the
     line matches, the associated action is executed. The
     patterns are tested in the order they occur in the program.

     Finally, after all the input is exhausted, gawk executes the
     code in the END block(s) (if any).

VARIABLES AND FIELDS
     AWK variables are dynamic; they come into existence when
     they are first used. Their values are either floating-point
     numbers or strings, or both, depending upon how they are
     used. AWK also has one-dimensional arrays; multiply
     dimensioned arrays may be simulated. Several pre-defined
     variables are set as a program runs; these will be described
     as needed and summarized below.

  Fields
     As each input line is read, gawk splits the line into
     fields, using the value of the FS variable as the field
     separator. If FS is a single character, fields are
     separated by that character. Otherwise, FS is expected to
     be a full regular expression. In the special case that FS
     is a single blank, fields are separated by runs of blanks
     and/or tabs. Note that the value of IGNORECASE (see below)
     will also affect how fields are split when FS is a regular
     expression.

     If the FIELDWIDTHS variable is set to a space separated list
     of numbers, each field is expected to have fixed width, and
     gawk will split up the record using the specified widths.
     The value of FS is ignored. Assigning a new value to FS
     overrides the use of FIELDWIDTHS, and restores the default
     behaviour.

     Each field in the input line may be referenced by its  posi-
     tion, $1, $2, and so on.  $0 is the whole line. The value of
     a field may be assigned to as  well.   Fields  need  not  be
     referenced by constants:

          n = 5
          print $n

     prints the fifth field in the input line.  The  variable  NF
     is set to the total number of fields in the input line.

     References to non-existent fields (i.e.  fields  after  $NF)
     produce  the  null-string.  However,  assigning  to  a  non-
     existent field (e.g., $(NF+2) = 5) will increase  the  value
     of NF, create any intervening fields with the null string as
     their value, and cause the value of  $0  to  be  recomputed,
     with the fields being separated by the value of OFS.
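
     The field rules above can be tried on a one-line input. This
     is a small sketch, assuming a POSIX-compatible awk installed
     as awk and the default values of FS and OFS (a single blank).

```shell
# $n selects a field by variable; reading a field past $NF yields the
# null string and leaves NF alone; assigning to $(NF+2) extends the
# record and rebuilds $0 with OFS between the fields.
out=$(printf 'a b c\n' | awk '{
    n = 2
    print $n
    print "[" $(NF+1) "]"
    $(NF+2) = 5
    print NF
    print
}')
printf '%s\n' "$out"
```

     The last line printed contains two adjacent blanks, because
     the intervening fourth field was created as the null string.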

  Built-in Variables
     AWK's built-in variables are:

     ARGC        The number of command line arguments (does not
                 include options to gawk, or the program source).

     ARGV        Array of command line arguments.  The  array  is
                 indexed  from 0 to ARGC - 1.  Dynamically chang-
                 ing the contents of ARGV can control  the  files
                 used for data.

     CONVFMT     The conversion format for  numbers,  "%.6g",  by
                 default.

     ENVIRON     An array containing the values of the current
                 environment. The array is indexed by the
                 environment variables, each element being the
                 value of that variable (e.g., ENVIRON["HOME"]
                 might be /u/arnold). Changing this array does
                 not affect the environment seen by programs
                 which gawk spawns via redirection or the
                 system() function. (This may change in a future
                 version of gawk.)

     FIELDWIDTHS A white-space separated list of field widths.
                 When set, gawk parses the input into fields of
                 fixed width, instead of using the value of the
                 FS variable as the field separator. The fixed
                 field width facility is still experimental;
                 expect the semantics to change as gawk evolves
                 over time.

     FILENAME    The name of the current input file.  If no files
                 are  specified on the command line, the value of
                 FILENAME is ``-''.

     FNR         The input record number  in  the  current  input
                 file.

     FS          The input field separator, a blank by default.

     IGNORECASE  Controls the  case-sensitivity  of  all  regular
                 expression  operations. If IGNORECASE has a non-
                 zero value,  then  pattern  matching  in  rules,
                 field  splitting  with  FS,  regular  expression
                 matching with ~ and !~, and the gsub(), index(),
                 match(),  split(),  and  sub() pre-defined func-
                 tions will all ignore case  when  doing  regular
                 expression  operations.   Thus, if IGNORECASE is
                 not equal to  zero,  /aB/  matches  all  of  the
                 strings "ab", "aB", "Ab", and "AB".  As with all
                 AWK variables, the initial value  of  IGNORECASE
                 is  zero,  so  all regular expression operations
                 are normally case-sensitive.

     NF          The  number  of  fields  in  the  current  input
                 record.

     NR          The total number of input records seen so far.

     OFMT        The  output  format  for  numbers,  "%.6g",   by
                 default.

     OFS         The output field separator, a blank by default.

     ORS         The output record separator, by default  a  new-
                 line.

     RS          The input record separator, by default a
                 newline. RS is exceptional in that only the first
                 character of its string value is used for
                 separating records. (This will probably change
                 in a future release of gawk.) If RS is set to
                 the null string, then records are separated by
                 blank lines. When RS is set to the null string,
                 the newline character always acts as a
                 field separator, in addition to whatever value
                 FS may have.

     RSTART      The index of  the  first  character  matched  by
                 match(); 0 if no match.

     RLENGTH     The length of the string matched by match();  -1
                 if no match.

     SUBSEP      The character used  to  separate  multiple  sub-
                 scripts in array elements, by default "\034".
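
     As an illustration of RS, NR, and NF working together, the
     sketch below sets RS to the null string so that blank lines
     separate records. It assumes a POSIX-compatible awk installed
     as awk; the sample input is made up.

```shell
# With RS = "", each blank-line-separated paragraph is one record and
# a newline inside the paragraph also separates fields.
out=$(printf 'alpha beta\ngamma\n\ndelta\n' | awk '
BEGIN { RS = "" }
      { print NR ": " NF " fields, first=" $1 }')
printf '%s\n' "$out"
```

     The first record spans two input lines but is counted once by
     NR, and its three words are counted as three fields.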

  Arrays
     Arrays are subscripted with an expression between square
     brackets ([ and ]). If the expression is an expression list
     (expr, expr ...) then the array subscript is a string
     consisting of the concatenation of the (string) value of each
     expression, separated by the value of the SUBSEP variable.
     This facility is used to simulate multiply dimensioned
     arrays. For example:

          i = "A" ; j = "B" ; k = "C"
          x[i, j, k] = "hello, world\n"

     assigns the string "hello, world\n" to the  element  of  the
     array  x  which  is indexed by the string "A\034B\034C". All
     arrays in  AWK  are  associative,  i.e.  indexed  by  string
     values.

     The special operator in may be used in an if or while state-
     ment to see if an array has an index consisting of a partic-
     ular value.

          if (val in array)
               print array[val]

     If the array has multiple subscripts, use (i, j) in array.

     The in construct may also be used in a for loop  to  iterate
     over all the elements of an array.

     An element may be deleted from an  array  using  the  delete
     statement.
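
     The array facilities described in this section can be combined
     as in the following sketch. It assumes a POSIX-compatible awk
     installed as awk; the array names x and part are illustrative.

```shell
# Store one element under a two-part subscript, test for it with "in",
# recover the parts by splitting the key on SUBSEP, then delete it.
out=$(awk 'BEGIN {
    x["A", "B"] = "hello"
    if (("A", "B") in x)
        print "found"
    for (k in x) {
        split(k, part, SUBSEP)
        print part[1] "-" part[2]
    }
    delete x["A", "B"]
    n = 0
    for (k in x)
        n++
    print n
}')
printf '%s\n' "$out"
```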

  Variable Typing And Conversion
     Variables and fields may be (floating point) numbers, or
     strings, or both. How the value of a variable is interpreted
     depends upon its context. If used in a numeric expression,
     it will be treated as a number; if used as a string, it will
     be treated as a string.

     To force a variable to be treated as a number, add 0 to  it;
     to  force  it to be treated as a string, concatenate it with
     the null string.

     When a string must be converted to a number, the conversion
     is accomplished using atof(3). A number is converted to a
     string by using the value of CONVFMT as a format string for
     sprintf(3), with the numeric value of the variable as the
     argument. However, even though all numbers in AWK are
     floating-point, integral values are always converted as
     integers. Thus, given

          CONVFMT = "%2.2f"
          a = 12
          b = a ""

     the variable b has a value of "12" and not "12.00".

     _G_a_w_k performs comparisons as follows: If two  variables  are
     numeric,  they  are  compared  numerically.  If one value is
     numeric and the other has a string value that is a ``numeric
     string,''  then comparisons are also done numerically.  Oth-
     erwise, the numeric value is converted to  a  string  and  a
     string  comparison  is performed.  Two strings are compared,
     of course, as strings.   According  to  the  POSIX  standard
     (draft  11),  even  if  two  strings  are numeric strings, a
     numeric comparison is performed.  However, this  is  clearly
     incorrect, and _g_a_w_k does not do this.

     Uninitialized variables have the numeric  value  0  and  the
     string value "" (the null, or empty, string).
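
     The conversion and comparison rules in this section can be
     checked with a short program. This is a sketch only, assuming
     a POSIX-compatible awk installed as awk that supports CONVFMT;
     all variable names are illustrative.

```shell
# Integral values ignore CONVFMT when converted to strings; other
# numbers use it. Two numbers compare numerically, two string
# constants compare as strings. An unset variable is both "" and 0.
out=$(awk 'BEGIN {
    CONVFMT = "%2.2f"
    a = 12;   b = a ""
    c = 12.5; d = c ""
    print b, d
    n = (10 < 9)
    s = ("10" < "9")
    print n, s
    print "[" u "]", u + 0
}')
printf '%s\n' "$out"
```

     The string comparison is true because "1" sorts before "9",
     even though 10 is numerically greater than 9.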

PATTERNS AND ACTIONS
     AWK is a line-oriented language. The pattern comes first,
     and then the action. Action statements are enclosed in { and
     }. Either the pattern or the action may be missing, but
     not both. If the pattern is missing, the action is executed
     for every line of input. A missing action is equivalent to

          { print }

     which prints the entire line.

     Comments begin with the ``#'' character, and continue until
     the end of the line. Blank lines may be used to separate
     statements. Normally, a statement ends with a newline;
     however, this is not the case for lines ending in a ``,'',
     ``{'', ``?'', ``:'', ``&&'', or ``||''. Lines ending in do
     or else also have their statements automatically continued
     on the following line. In other cases, a line can be
     continued by ending it with a ``\'', in which case the newline
     will be ignored.

     Multiple statements may be put on  one  line  by  separating
     them  with  a  ``;''.   This  applies to both the statements
     within the action part of a pattern-action pair  (the  usual
     case), and to the pattern-action statements themselves.
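
     A rule with a missing action and a rule with a missing pattern
     can sit side by side, as in this sketch. It assumes a
     POSIX-compatible awk installed as awk; the counter n is
     illustrative.

```shell
# First rule: pattern only, so matching lines are printed.
# Second rule: action only, so it runs for every input line.
out=$(printf 'one\ntwo\nthree\n' | awk '
/t/                  # pattern only: the default action prints the line
{ n++ }              # action only: executed for each line of input
END { print n }')
printf '%s\n' "$out"
```

     Only the two lines containing a "t" are printed, while the
     counter reflects all three input lines.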

  Patterns
     AWK patterns may be one of the following:

          BEGIN
          END
          /_r_e_g_u_l_a_r _e_x_p_r_e_s_s_i_o_n/
          _r_e_l_a_t_i_o_n_a_l _e_x_p_r_e_s_s_i_o_n
          _p_a_t_t_e_r_n && _p_a_t_t_e_r_n
          _p_a_t_t_e_r_n || _p_a_t_t_e_r_n
          _p_a_t_t_e_r_n ? _p_a_t_t_e_r_n : _p_a_t_t_e_r_n
          (_p_a_t_t_e_r_n)
          ! _p_a_t_t_e_r_n
          _p_a_t_t_e_r_n_1, _p_a_t_t_e_r_n_2

     BEGIN and END are two special kinds of  patterns  which  are
     not tested against the input.  The action parts of all BEGIN
     patterns are merged as if all the statements had been  writ-
     ten in a single BEGIN block. They are executed before any of
     the input is read. Similarly, all the END blocks are merged,
     and  executed  when  all  the input is exhausted (or when an
     exit statement is executed).  BEGIN and END patterns  cannot
     be  combined  with  other  patterns  in pattern expressions.
     BEGIN and END patterns cannot have missing action parts.

     For /_r_e_g_u_l_a_r _e_x_p_r_e_s_s_i_o_n/ patterns, the associated  statement
     is  executed  for  each  input line that matches the regular
     expression.  Regular expressions are the same  as  those  in
     _e_g_r_e_p(1), and are summarized below.

     A _r_e_l_a_t_i_o_n_a_l _e_x_p_r_e_s_s_i_o_n may use any of the operators defined
     below  in  the  section  on  actions.   These generally test
     whether certain fields match certain regular expressions.

     The &&, ||, and ! operators are logical AND, logical OR, and
     logical  NOT,  respectively, as in C.  They do short-circuit
     evaluation, also as in C, and are used  for  combining  more
     primitive   pattern   expressions.  As  in  most  languages,
     parentheses may be used to change the order of evaluation.

     The ?: operator is like the same operator in C. If the first
     pattern  is  true  then  the pattern used for testing is the
     second pattern, otherwise it is the third. Only one  of  the
     second and third patterns is evaluated.

     The pattern1, pattern2 form of an expression is called a
     range pattern. It matches all input records starting with a
     record that matches pattern1, and continuing until a record
     that matches pattern2, inclusive. It does not combine with
     any other sort of pattern expression.
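
     For instance, a range pattern and a pattern built with || can
     be compared on the same input. This is a minimal sketch,
     assuming a POSIX-compatible awk installed as awk; the sample
     data and the shell variable names are made up.

```shell
data=$(printf '1\nSTART\n2\nEND\n3\n')

# Range pattern: prints from the START line through the END line.
range=$(printf '%s\n' "$data" | awk '/START/, /END/')

# Logical OR of a relational expression and a regular expression.
combo=$(printf '%s\n' "$data" | awk 'NR == 1 || /^3$/')

printf '%s\n--\n%s\n' "$range" "$combo"
```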

  Regular Expressions
     Regular expressions are the extended kind found in egrep.
     They are composed of characters as follows:

     c         matches the non-metacharacter c.

     \c        matches the literal character c.

     .         matches any character except newline.

     ^         matches the beginning of a line or a string.

     $         matches the end of a line or a string.

     [abc...]  character class, matches any of the characters
               abc....

     [^abc...] negated character class, matches any character
               except abc... and newline.

     r1|r2     alternation: matches either r1 or r2.

     r1r2      concatenation: matches r1, and then r2.

     r+        matches one or more r's.

     r*        matches zero or more r's.

     r?        matches zero or one r's.

     (r)       grouping: matches r.

     The escape sequences that are valid in string constants (see
     below) are also legal in regular expressions.
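
     Several of these constructs appear together in the sketch
     below, which assumes a POSIX-compatible awk installed as awk.
     The word list is made up for the example.

```shell
# Rule A uses anchors, grouping, alternation, and "?".
# Rule B escapes "." so that it matches only a literal dot.
out=$(printf 'cat\ncart\ncot\nc.t\nscatter\n' | awk '
/^c(a|o)r?t$/ { print "A:" $0 }
/^c\.t$/      { print "B:" $0 }')
printf '%s\n' "$out"
```

     The line "scatter" matches neither rule because both regular
     expressions are anchored at the beginning of the line.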

  Actions
     Action statements are enclosed in braces, { and  }.   Action
     statements consist of the usual assignment, conditional, and
     looping statements found in most languages.  The  operators,
     control  statements,  and  input/output statements available
     are patterned after those in C.

  Operators
     The operators in AWK, in order of increasing precedence, are:

     = += -=
     *= /= %= ^= Assignment. Both absolute assignment (var =
                 value) and operator-assignment (the other forms)
                 are supported.

     ?:          The C conditional expression. This has the form
                 expr1 ? expr2 : expr3. If expr1 is true, the
                 value of the expression is expr2, otherwise it
                 is expr3. Only one of expr2 and expr3 is
                 evaluated.

     ||          Logical OR.

     &&          Logical AND.

     ~ !~        Regular expression match, negated match. NOTE:
                 Do not use a constant regular expression (/foo/)
                 on the left-hand side of a ~ or !~. Only use
                 one on the right-hand side. The expression
                 /foo/ ~ exp has the same meaning as (($0 ~
                 /foo/) ~ exp). This is usually not what was
                 intended.

     < >
     <= >=
     != ==       The regular relational operators.

     blank       String concatenation.

     + -         Addition and subtraction.

     * / %       Multiplication, division, and modulus.

     + - !       Unary plus, unary minus, and logical negation.

     ^           Exponentiation (** may also be used, and **= for
                 the assignment operator).

     ++ --       Increment and decrement, both prefix  and  post-
                 fix.

     $           Field reference.
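
     A few consequences of this precedence order are shown in the
     sketch below. It assumes a POSIX-compatible awk installed as
     awk; the variables x and r are illustrative.

```shell
# Multiplication binds tighter than addition, exponentiation tighter
# than multiplication, and arithmetic tighter than concatenation.
out=$(awk 'BEGIN {
    print 2 + 3 * 4
    print 2 * 3 ^ 2
    print 1 + 2 " " 3 * 4
    x = 5; x ^= 2; print x
    r = ("foobar" ~ /o+b/) ? "match" : "no match"
    print r
}')
printf '%s\n' "$out"
```

     The regular expression constant is kept on the right-hand
     side of ~, as the note above recommends.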

  Control Statements
     The control statements are as follows:

          if (_c_o_n_d_i_t_i_o_n) _s_t_a_t_e_m_e_n_t [ else _s_t_a_t_e_m_e_n_t ]
          while (_c_o_n_d_i_t_i_o_n) _s_t_a_t_e_m_e_n_t
          do _s_t_a_t_e_m_e_n_t while (_c_o_n_d_i_t_i_o_n)
          for (_e_x_p_r_1; _e_x_p_r_2; _e_x_p_r_3) _s_t_a_t_e_m_e_n_t
          for (_v_a_r in _a_r_r_a_y) _s_t_a_t_e_m_e_n_t
          break



Free Software FoundationLast change: Jun 5 1991                   10






GAWK(1)                 Utility Commands                  GAWK(1)



          continue
          delete _a_r_r_a_y[_i_n_d_e_x]
          exit [ _e_x_p_r_e_s_s_i_o_n ]
          { _s_t_a_t_e_m_e_n_t_s }

  I/O Statements
     The input/output statements are as follows:

     close(_f_i_l_e_n_a_m_e)       Close file (or pipe, see below).

     getline               Set $0 from next input record; set NF,
                           NR, FNR.

     getline <_f_i_l_e         Set $0 from next record of  _f_i_l_e;  set
                           NF.

     getline _v_a_r           Set _v_a_r from next  input  record;  set
                           NF, FNR.

     getline _v_a_r <_f_i_l_e     Set _v_a_r from next record of _f_i_l_e.

     next                  Stop  processing  the  current   input
                           record.  The next input record is read
                           and processing starts  over  with  the
                           first  pattern  in the AWK program. If
                           the end of the input data is  reached,
                           the  END  block(s),  if  any, are exe-
                           cuted.

     print                 Prints the current record.

     print _e_x_p_r-_l_i_s_t       Prints expressions.

     print _e_x_p_r-_l_i_s_t >_f_i_l_e Prints expressions on _f_i_l_e.

     printf _f_m_t, _e_x_p_r-_l_i_s_t Format and print.

     printf _f_m_t, _e_x_p_r-_l_i_s_t >_f_i_l_e
                           Format and print on _f_i_l_e.

     system(_c_m_d-_l_i_n_e)      Execute  the  command  _c_m_d-_l_i_n_e,   and
                           return the exit status.  (This may not
                           be available on non-POSIX systems.)

     Other input/output redirections are also allowed. For print
     and printf, >>file appends output to the file, while | command
     writes on a pipe. In a similar fashion, command | getline
     pipes into getline. Getline will return 0 on end of
     file, and -1 on an error.
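
     Reading from a command with getline might look like the
     sketch below. It assumes a POSIX-compatible awk installed as
     awk and a shell that provides echo; the names cmd, line, and
     n are illustrative. The loop tests for a result greater than
     zero so that it stops on end of file (0) and on error (-1).

```shell
# Read every line produced by a command, count the lines, and close
# the pipe when done.
out=$(awk 'BEGIN {
    cmd = "echo a; echo b"
    while ((cmd | getline line) > 0)
        n++
    close(cmd)
    print n, line
}')
printf '%s\n' "$out"
```

     After the loop, line still holds the last record that was
     read successfully.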

  The _p_r_i_n_t_f Statement




Free Software FoundationLast change: Jun 5 1991                   11






GAWK(1)                 Utility Commands                  GAWK(1)



     The AWK versions of the printf statement and sprintf() func-
     tion  (see below) accept the following conversion specifica-
     tion formats:

     %c   An ASCII character. If the argument used for %c is
          numeric, it is treated as a character and printed.
          Otherwise, the argument is assumed to be a string, and
          only the first character of that string is printed.

     %d   A decimal number (the integer part).

     %i   Just like %d.

     %e   A floating point number of the form [-]d.dddddde[+-]dd.

     %f   A floating point number of the form [-]ddd.dddddd.

     %g   Use e or f conversion, whichever is shorter, with  non-
          significant zeros suppressed.

     %o   An unsigned octal number (again, an integer).

     %s   A character string.

     %x   An unsigned hexadecimal number (an integer).

     %X   Like %x, but using ABCDEF instead of abcdef.

     %%   A single % character; no argument is converted.

     There are  optional,  additional  parameters  that  may  lie
     between the % and the control letter:

     -    The expression  should  be  left-justified  within  its
          field.

     _w_i_d_t_h
          The field should be padded to this width. If the number
          has  a leading zero, then the field will be padded with
          zeros.  Otherwise it is padded with blanks.

     ._p_r_e_c
          A number indicating the maximum  width  of  strings  or
          digits to the right of the decimal point.

     The dynamic _w_i_d_t_h  and  _p_r_e_c  capabilities  of  the  ANSI  C
     printf() routines are supported.  A * in place of either the
     width or prec specifications will cause their values  to  be
     taken from the argument list to printf or sprintf().
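
     The conversions and modifiers above can be seen side by side
     in the following sketch, which assumes a POSIX-compatible awk
     installed as awk and an ASCII-compatible character set. The
     brackets are only there to make the padding visible.

```shell
# One printf call covering left-justification, width, zero padding,
# precision on a number and on a string, %c with a string and with a
# number, hexadecimal, octal, and a literal percent sign.
out=$(awk 'BEGIN {
    printf "[%-5s][%5d][%05d][%.2f][%.3s][%c][%c][%x][%o][%%]\n",
           "ab", 42, 42, 3.14159, "abcdef", "xyz", 65, 255, 8
}')
printf '%s\n' "$out"
```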

  Special File Names
     When doing I/O redirection from either print or printf into
     a file, or via getline from a file, gawk recognizes certain
     special filenames internally. These filenames allow access
     to open file descriptors inherited from gawk's parent
     process (usually the shell). The filenames are:

     /dev/stdin
               The standard input.

     /dev/stdout
               The standard output.

     /dev/stderr
               The standard error output.

     /dev/fd/_n The file denoted by the open file descriptor _n.

     These are particularly useful for error messages. For  exam-
     ple:

          print "You blew it!" > "/dev/stderr"

     whereas you would otherwise have to use

          print "You blew it!" | "cat 1>&2"

     These file names may also be used on  the  command  line  to
     name data files.

  Numeric Functions
     AWK has the following pre-defined arithmetic functions:

     atan2(_y, _x) returns the arctangent of _y/_x in radians.

     cos(_e_x_p_r)   returns the cosine in radians.

     exp(_e_x_p_r)   the exponential function.

     int(_e_x_p_r)   truncates to integer.

     log(_e_x_p_r)   the natural logarithm function.

     rand()      returns a random number between 0 and 1.

     sin(_e_x_p_r)   returns the sine in radians.

     sqrt(_e_x_p_r)  the square root function.

     srand(_e_x_p_r) use _e_x_p_r as a new seed  for  the  random  number
                 generator.  If  no _e_x_p_r is provided, the time of
                 day will be used.  The return value is the  pre-
                 vious seed for the random number generator.
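
     A quick check of these functions, as a sketch assuming a
     POSIX-compatible awk installed as awk and the default OFMT of
     "%.6g". The seed value 1 and the variable names are arbitrary.

```shell
# int() truncates toward zero; atan2(0, -1) yields pi; rand() is only
# tested for falling in the documented range, since its value depends
# on the implementation.
out=$(awk 'BEGIN {
    print int(3.9), int(-3.9), sqrt(16), exp(0), log(1)
    print atan2(0, -1)
    srand(1)
    r = rand()
    ok = (r >= 0 && r < 1)
    print ok
}')
printf '%s\n' "$out"
```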



  String Functions
     AWK has the following pre-defined string functions:

     gsub(_r, _s, _t)           for each substring matching the reg-
                             ular  expression  _r in the string _t,
                             substitute the string _s, and  return
                             the  number  of substitutions.  If _t
                             is not supplied, use $0.

     index(_s, _t)             returns the index of the string _t in
                             the  string  _s,  or  0  if  _t is not
                             present.

     length(_s)               returns the length of the string  _s,
                             or the length of $0 if _s is not sup-
                             plied.

     match(_s, _r)             returns the position in _s where  the
                             regular expression _r occurs, or 0 if
                             _r  is  not  present,  and  sets  the
                             values of RSTART and RLENGTH.

     split(_s, _a, _r)          splits the string _s into the array _a
                             on  the  regular  expression  _r, and
                             returns the number of fields.  If  _r
                             is omitted, FS is used instead.

     sprintf(_f_m_t, _e_x_p_r-_l_i_s_t) prints _e_x_p_r-_l_i_s_t according  to  _f_m_t,
                             and returns the resulting string.

     sub(_r, _s, _t)            just like gsub(), but only the first
                             matching substring is replaced.

     substr(_s, _i, _n)         returns the _n-character substring of
                             _s  starting  at _i.  If _n is omitted,
                             the rest of _s is used.

     tolower(_s_t_r)            returns a copy of  the  string  _s_t_r,
                             with  all  the upper-case characters
                             in   _s_t_r   translated    to    their
                             corresponding   lower-case  counter-
                             parts.   Non-alphabetic   characters
                             are left unchanged.

     toupper(_s_t_r)            returns a copy of  the  string  _s_t_r,
                             with  all  the lower-case characters
                             in   _s_t_r   translated    to    their
                             corresponding   upper-case  counter-
                             parts.   Non-alphabetic   characters
                             are left unchanged.
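
     A short sketch of the string functions in use  follows.   The
     sample  strings  and  the variable names s, n, and parts are
     assumptions chosen for illustration.

```sh
out=$(awk 'BEGIN {
    s = "hello world"
    print index(s, "world")                 # position of the substring
    print length(s)
    print match(s, /o+/), RSTART, RLENGTH   # match() also sets both variables
    print substr(s, 7, 3)                   # three characters from position 7
    n = gsub(/o/, "0", s)                   # replace every match, count them
    print n, s
    sub(/l+/, "L", s)                       # replace the first match only
    print s
    n = split("a:b:c", parts, ":")          # fill the array parts, return count
    print n, parts[2]
    print toupper("abc") tolower("XYZ")
    print sprintf("%03d", 7)
}')
echo "$out"
```

     Note that gsub() and sub() modify their third argument in place
     and return a count, not the new string.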





  Time Functions
     Since one of the primary uses of AWK programs is  processing
     log files that contain time stamp information, _g_a_w_k provides
     the following two functions for obtaining  time  stamps  and
     formatting them.

     systime() returns the current time of day as the  number  of
               seconds  since the Epoch (Midnight UTC, January 1,
               1970 on POSIX systems).

     strftime(_f_o_r_m_a_t, _t_i_m_e_s_t_a_m_p)
               formats _t_i_m_e_s_t_a_m_p according to  the  specification
               in  _f_o_r_m_a_t.   The  _t_i_m_e_s_t_a_m_p should be of the same
               form as returned by systime().   If  _t_i_m_e_s_t_a_m_p  is
               missing, the current time of day is used.  See the
               specification for the strftime() function in  ANSI
               C  for  the format conversions that are guaranteed
               to  be  available.   A  public-domain  version  of
               _s_t_r_f_t_i_m_e(3) and a man page for it are shipped with
               _g_a_w_k; if that version was used to build _g_a_w_k, then
               all  of the conversions described in that man page
               are available to _g_a_w_k.

  String Constants
     String constants in AWK are sequences of characters enclosed
     between  double  quotes  ("). Within strings, certain _e_s_c_a_p_e
     _s_e_q_u_e_n_c_e_s are recognized, as in C. These are:

     \\   A literal backslash.

     \a   The ``alert'' character; usually the ASCII BEL  charac-
          ter.

     \b   backspace.

     \f   form-feed.

     \n   new line.

     \r   carriage return.

     \t   horizontal tab.

     \v   vertical tab.

     \x_h_e_x _d_i_g_i_t_s
          The character represented by the string of  hexadecimal
          digits  following  the \x.  As in ANSI C, all following
          hexadecimal digits are considered part  of  the  escape
          sequence.  (This feature should tell us something about
          language design by  committee.)  E.g.,  "\x1B"  is  the
          ASCII ESC (escape) character.



     \_d_d_d The character represented by the  1-,  2-,  or  3-digit
          sequence  of octal digits. E.g. "\033" is the ASCII ESC
          (escape) character.

     \_c   The literal character _c.

     The escape sequences may also be used inside constant  regu-
     lar  expressions  (e.g.,  /[ \t\f\n\r\v]/ matches whitespace
     characters).
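
     The sketch below shows a few of these escape  sequences  in
     string  constants  and  in  a  constant  regular expression.
     The sample strings are illustrative only.

```sh
out=$(awk 'BEGIN {
    printf "name\tvalue\n"            # \t is a horizontal tab, \n a new line
    print "back\\slash"               # \\ is one literal backslash
    print "\101\102\103"              # octal escapes for A, B, and C
    print length("\033[0m")           # ESC [ 0 m: four characters
    n = split("a b\tc", f, /[ \t]/)   # escapes inside a regular expression
    print n
}')
echo "$out"
```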

FUNCTIONS
     Functions in AWK are defined as follows:

          function _n_a_m_e(_p_a_r_a_m_e_t_e_r _l_i_s_t) { _s_t_a_t_e_m_e_n_t_s }

     Functions are executed when called from  within  the  action
     parts  of  regular pattern-action statements. Actual parame-
     ters supplied in the function call are used  to  instantiate
     the  formal parameters declared in the function.  Arrays are
     passed by reference; other variables are passed by value.

     Since  functions  were  not  originally  part  of  the   AWK
     language,  the  provision  for  local  variables  is  rather
     clumsy: They are declared as extra parameters in the parame-
     ter list. The convention is to separate local variables from
     real parameters by extra spaces in the parameter  list.  For
     example:

          function  f(p, q,     a, b) { # a & b are local
                         ..... }

          /abc/     { ... ; f(1, 2) ; ... }

     The left parenthesis in  a  function  call  is  required  to
     immediately  follow the function name, without any interven-
     ing white space.  This is to  avoid  a  syntactic  ambiguity
     with  the concatenation operator.  This restriction does not
     apply to the built-in functions listed above.

     Functions may call each other and may be  recursive.   Func-
     tion  parameters  used as local variables are initialized to
     the null string and the number zero  upon  function  invoca-
     tion.

     The word func may be used in place of function.
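
     The points above can be sketched in one small program.   The
     function  names  fill, sum, bump, and fact are hypothetical,
     chosen only to show local variables, array and scalar param-
     eter passing, and recursion.

```sh
out=$(awk '
# fill(arr, n): store the squares of 1..n in arr; i is a local variable.
function fill(arr, n,    i) {
    for (i = 1; i <= n; i++)
        arr[i] = i * i
}
# total starts out null, which acts as zero in arithmetic.
function sum(arr, n,    i, total) {
    for (i = 1; i <= n; i++)
        total += arr[i]
    return total
}
function bump(x) { x++ }              # changes only the local copy
function fact(n) { return n <= 1 ? 1 : n * fact(n - 1) }
BEGIN {
    fill(sq, 3)                       # the array is passed by reference
    v = 10; bump(v)                   # the scalar is passed by value
    print sum(sq, 3), v, fact(5)
}')
echo "$out"
```

     The caller sees the elements stored by fill(), while  v  keeps
     its value after the call to bump().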

EXAMPLES
     Print and sort the login names of all users:

          BEGIN     { FS = ":" }
               { print $1 | "sort" }




     Count lines in a file:

               { nlines++ }
          END  { print nlines }

     Precede each line by its number in the file:

          { print FNR, $0 }

     Concatenate and line number (a variation on a theme):

          { print NR, $0 }
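
     The difference between the last two examples shows only when
     more than one file is read.  The following sketch, which uses
     two made-up temporary files, prints both counters side by side.

```sh
# The file names and contents are invented for illustration.
tmp=$(mktemp -d)
printf 'a\nb\n' > "$tmp/one"
printf 'c\n' > "$tmp/two"
out=$(awk '{ print FNR, NR, $0 }' "$tmp/one" "$tmp/two")
rm -rf "$tmp"
echo "$out"
```

     FNR restarts at 1 for the second file, while NR  keeps  count-
     ing across all input.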

SEE ALSO
     _e_g_r_e_p(1)

     _T_h_e _A_W_K _P_r_o_g_r_a_m_m_i_n_g _L_a_n_g_u_a_g_e, Alfred V. Aho, Brian  W.  Ker-
     nighan,  Peter  J. Weinberger, Addison-Wesley, 1988. ISBN 0-
     201-07981-X.

     _T_h_e _G_A_W_K _M_a_n_u_a_l, published by the Free Software  Foundation,
     1991.

POSIX COMPATIBILITY
     A primary goal for _g_a_w_k  is  compatibility  with  the  POSIX
     standard,  as  well  as with the latest version of UNIX _a_w_k.
     To this end, _g_a_w_k incorporates the  following  user  visible
     features  which  are  not described in the AWK book, but are
     part of _a_w_k in System V Release 4,  and  are  in  the  POSIX
     standard.

     The -v option for assigning variables before program  execu-
     tion  starts  is  new.  The book indicates that command line
     variable assignment happens when _a_w_k  would  otherwise  open
     the  argument  as  a file, which is after the BEGIN block is
     executed.  However, in earlier implementations, when such an
     assignment  appeared  before  any file names, the assignment
     would happen _b_e_f_o_r_e the BEGIN block was  run.   Applications
     came  to depend on this ``feature.'' When _a_w_k was changed to
     match its documentation, this option was added to accommodate
     applications  that  depended  upon the old behaviour.  (This
     feature was agreed upon by both the AT&T  and  GNU  develop-
     ers.)
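
     The timing difference can be observed  with  a  sketch  such
     as  the  following,  which  assumes  an  _a_w_k that follows the
     documented behaviour.  The variable name greeting is made up.

```sh
# With -v, the value is already set when the BEGIN block runs.
early=$(awk -v greeting=hi 'BEGIN { print "[" greeting "]" }')

# As an operand, the assignment takes effect only after BEGIN has run.
late=$(echo line |
    awk 'BEGIN { print "[" greeting "]" } { print "[" greeting "]" }' greeting=hi)

echo "$early"
echo "$late"
```

     In the second command the BEGIN block prints an empty  pair  of
     brackets, and the value appears only once input is read.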

     The -W option for implementation specific features  is  from
     the POSIX standard.

     When processing arguments,  _g_a_w_k  uses  the  special  option
     ``--''  to signal the end of arguments, and warns about, but
     otherwise ignores, undefined options.





     The AWK book does not define the return  value  of  srand().
     The  System  V  Release 4 version of UNIX _a_w_k (and the POSIX
     standard) has it return the seed  it  was  using,  to  allow
     keeping  track of random number sequences. Therefore srand()
     in _g_a_w_k also returns its current seed.

     Other new features are: The use of multiple -f options (from
     MKS _a_w_k); the ENVIRON array; the \a, and \v escape sequences
     (done originally in _g_a_w_k and  fed  back  into  AT&T's);  the
     tolower()  and toupper() built-in functions (from AT&T); and
     the ANSI C conversion specifications in printf  (done  first
     in AT&T's version).

GNU EXTENSIONS
     _G_a_w_k has some extensions to POSIX _a_w_k.  They  are  described
     in  this  section.  All the extensions described here can be
     disabled by invoking _g_a_w_k with the -W compat option.

     The following features of _g_a_w_k are not  available  in  POSIX
     _a_w_k.

          o+ The \x escape sequence.

          o+ The systime() and strftime() functions.

          o+ The special file names available for I/O  redirection
            are not recognized.

          o+ The IGNORECASE variable and its side-effects are  not
            available.

          o+ The FIELDWIDTHS variable and fixed width field split-
            ting.

          o+ No path search is performed for files named  via  the
            -f  option.   Therefore the AWKPATH environment vari-
            able is not special.

     The AWK book does not define the return value of the close()
     function.   _G_a_w_k's close() returns the value from _f_c_l_o_s_e(3),
     or _p_c_l_o_s_e(3), when closing a file or pipe, respectively.

     When _g_a_w_k is invoked with the -W compat option,  if  the  _f_s
     argument  to  the -F option is ``t'', then FS will be set to
     the tab character.  Since this  is  a  rather  ugly  special
     case, it is not the default behavior.

BUGS
     The -F option is not necessary given the command line  vari-
     able  assignment feature; it remains only for backwards com-
     patibility.
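
     A small sketch of the equivalence, using an  invented  input
     line, is:

```sh
line='root:x:0:0'                     # made-up sample record
a=$(echo "$line" | awk -F: '{ print $1, $3 }')
b=$(echo "$line" | awk -v FS=: '{ print $1, $3 }')
echo "$a"
echo "$b"
```

     Both commands split the record on colons and print  the  same
     two fields.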




VERSION INFORMATION
     This man page documents _g_a_w_k, version 2.13.

     For the 2.13 version of _g_a_w_k, the -c, -V,  -C,  -a,  and  -e
     options  of  the 2.11 version are recognized.  However, _g_a_w_k
     will print a warning message, and these options will go away
     in the 2.14 version.

     The 2.12 version was a  development  version  that  was  not
     officially released.

AUTHORS
     The original version of UNIX _a_w_k  was  designed  and  imple-
     mented  by Alfred Aho, Peter Weinberger, and Brian Kernighan
     of AT&T Bell Labs. Brian Kernighan continues to maintain and
     enhance it.

     Paul Rubin and Jay Fenlason, of the  Free  Software  Founda-
     tion, wrote _g_a_w_k, to be compatible with the original version
     of _a_w_k distributed in Seventh Edition UNIX.  John Woods con-
     tributed  a number of bug fixes.  David Trueman of Dalhousie
     University, with contributions from Arnold Robbins at  Emory
     University  and  AudioFAX, made _g_a_w_k compatible with the new
     version of UNIX _a_w_k.

ACKNOWLEDGEMENTS
     Brian Kernighan of Bell Labs  provided  valuable  assistance
     during testing and debugging.  We thank him.



























Free Software FoundationLast change: Jun 5 1991                   19



